10 research outputs found
Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization
Pretrained language models have achieved remarkable success in natural
language understanding. However, fine-tuning pretrained models on limited
training data tends to overfit and thus diminish performance. This paper
presents Bi-Drop, a fine-tuning strategy that selectively updates model
parameters using gradients from various sub-nets dynamically generated by
dropout. Bi-Drop estimates sub-nets in an in-batch manner, which avoids the
hysteresis in sub-net updating that affects previous methods relying on
asynchronous sub-net estimation. Moreover, Bi-Drop needs only a single
mini-batch to estimate each sub-net, making more efficient use of the
training data. Experiments on the GLUE benchmark demonstrate
that Bi-Drop consistently outperforms previous fine-tuning methods.
Furthermore, empirical results show that Bi-Drop exhibits excellent
generalization ability and robustness in domain-transfer, data-imbalance, and
low-resource scenarios.
Comment: EMNLP 2023 Findings. Camera-ready version. Co-first authors with equal contribution
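The abstract describes the mechanism only at a high level, so the following is a toy sketch of the general idea rather than Bi-Drop's actual algorithm: gradients are computed on the same mini-batch under several dropout masks (each mask defining a sub-net), and only parameters whose gradients agree across sub-nets are updated. The sign-consistency rule and the linear-regression setting are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_with_dropout(W, X, y, keep=0.8):
    """Gradient of the MSE loss for one dropout sub-net of a linear model."""
    mask = rng.random(W.shape) < keep        # each dropout mask defines a sub-net
    Wm = W * mask / keep                     # inverted-dropout scaling
    err = X @ Wm - y                         # residuals of this sub-net
    return (X.T @ err / len(X)) * mask / keep

def bi_drop_style_step(W, X, y, lr=0.1, n_subnets=3):
    """Update only weights whose gradient sign agrees across all sub-nets
    estimated from the SAME mini-batch (the in-batch property above)."""
    grads = np.stack([grad_with_dropout(W, X, y) for _ in range(n_subnets)])
    consistent = np.abs(np.sign(grads).sum(axis=0)) == n_subnets
    return W - lr * grads.mean(axis=0) * consistent

# Toy regression problem: recover W_true from y = X @ W_true.
X = rng.normal(size=(32, 5))
W_true = rng.normal(size=(5, 2))
y = X @ W_true

W = np.zeros((5, 2))
for _ in range(200):
    W = bi_drop_style_step(W, X, y)
loss = ((X @ W - y) ** 2).mean()   # falls well below the initial loss
```

Because every sub-net is drawn from the same mini-batch, no stale gradient estimates are carried over between batches, which is the hysteresis issue the abstract attributes to asynchronous estimation.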
A Survey on In-context Learning
With the increasing ability of large language models (LLMs), in-context
learning (ICL) has become a new paradigm for natural language processing (NLP),
where LLMs make predictions based only on contexts augmented with a few
examples. Exploring ICL to evaluate and extrapolate the abilities of LLMs has
become a new trend. In this paper, we aim to survey and summarize the progress
and challenges of ICL. We first present a formal definition of ICL and clarify
its correlation to related studies. Then, we organize and discuss advanced
techniques, including training strategies, demonstration design strategies,
and related analyses. Finally, we discuss the challenges of ICL and
provide potential directions for further research. We hope that our work can
encourage more research on uncovering how ICL works and improving ICL.
Comment: Papers collected until 2023/05/2
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Large pretrained language models have shown surprising in-context learning
(ICL) ability. With a few demonstration input-label pairs, they can predict the
label for an unseen input without parameter updates. Despite its great
empirical success, the working mechanism of ICL remains an open question. In
this paper, we explain language models as meta-optimizers and understand
in-context learning as implicit finetuning. Theoretically, we show that
Transformer attention has a dual form of gradient descent. On top of this, we understand ICL
as follows: GPT first produces meta-gradients according to the demonstration
examples, and then these meta-gradients are applied to the original GPT to
build an ICL model. We comprehensively compare the behaviors of in-context
learning and explicit finetuning on real tasks to provide empirical evidence
that supports our understanding. Experimental results show that in-context
learning behaves similarly to explicit finetuning from multiple perspectives.
Inspired by the dual form between Transformer attention and gradient descent,
we design a momentum-based attention by analogy with gradient descent with
momentum. The improved performance over vanilla attention further supports our
understanding from another perspective, and more importantly, shows the
potential to utilize our understanding for future model design. The code is
available at \url{https://aka.ms/icl}.
Comment: Accepted to ACL 2023 Findings
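The dual form mentioned above can be made concrete when the softmax is relaxed to linear attention: attending over demonstration tokens is numerically identical to adding a sum-of-outer-products weight update to a zero-shot linear map, the same functional form as a gradient-descent update to a linear layer. The sketch below uses our own notation and a single simplified head, and checks this identity numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                # hidden size
W_V = rng.normal(size=(d, d))        # value projection
W_K = rng.normal(size=(d, d))        # key projection

X_test = rng.normal(size=(d, 4))     # test-input tokens
X_demo = rng.normal(size=(d, 6))     # in-context demonstration tokens
q = rng.normal(size=(d,))            # current query vector

# Linear attention over all tokens [X_test; X_demo]:
X_all = np.concatenate([X_test, X_demo], axis=1)
attn_out = (W_V @ X_all) @ (W_K @ X_all).T @ q

# Dual view: a zero-shot weight matrix from the test tokens alone, plus an
# ICL "meta-gradient" update that is a sum of outer products over the
# demonstrations -- the same form as a gradient update to a linear layer.
W_zsl = (W_V @ X_test) @ (W_K @ X_test).T
dW_icl = (W_V @ X_demo) @ (W_K @ X_demo).T
dual_out = (W_zsl + dW_icl) @ q

assert np.allclose(attn_out, dual_out)
```

The identity holds because `X_all @ X_all.T` decomposes into the test-token and demonstration-token parts; the demonstrations thus act as an implicit weight update, which is the sense in which the paper reads ICL as implicit finetuning.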